-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Trm 30823 UniParc proteome redundant fasta #249
base: main
Are you sure you want to change the base?
Conversation
# Conflicts: # controlled-vocabulary/src/test/resources/xdb/dr_ord # core-parser/src/main/java/org/uniprot/core/parser/fasta/uniparc/UniParcFastaParser.java # core-parser/src/test/java/org/uniprot/core/parser/fasta/UniParcFastaParserTest.java
# Conflicts: # xml-parser/src/main/java/org/uniprot/core/xml/uniparc/UniParcDBCrossReferenceConverter.java
I am going to merge with main before review. |
VECTORBASE(3500, "VectorBase", false, false), | ||
VEGA(3600, "VEGA", true, false, "https://vega.sanger.ac.uk/id/%id"), | ||
WORMBASE_PARASITE(3700, "WBParaSite", true, true, "https://parasite.wormbase.org/id/%id"), | ||
WORMBASE(3800, "WormBase", true, true,"https://wormbase.org/db/seq/protein?name=%id;class=CDS"); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do we know if a database is source? Could you please add source of this information in this class?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Jie provided the list of databases that are sources in UniParc.
|
||
private static final Set<UniParcDatabase> uniProtDatabases = Set.of( | ||
UniParcDatabase.SWISSPROT, UniParcDatabase.TREMBL, UniParcDatabase.SWISSPROT_VARSPLIC); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use uppercase for const.
@@ -11,6 +11,8 @@ | |||
*/ | |||
public interface UniParcCrossReference extends CrossReference<UniParcDatabase> { | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should add formula and an example how we construct source. :: and eg. ABC01415:UP000005640:Chromosome 1
hasActive = true; | ||
} | ||
} else { | ||
sourceXrefs.add(xref); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here we are assuming that database type other than uniProtDatabases
are source xref. But in UniParcDatabase,java i can see other non-source xrefs e.g. EPO, FLYbase. Will they not be added as sourceXrefs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are doing a filter in the REST side (we filter proteome Id). I can add an else if for sources if you prefer
@@ -0,0 +1,134 @@ | |||
package org.uniprot.core.parser.fasta.uniparc; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should add few comments in this class.
} | ||
|
||
private static StringBuilder getFastaHeader(List<UniParcCrossReference> xrefs, boolean hasActive, String id, String proteomeId, boolean isSource) { | ||
Set<String> proteinName = new LinkedHashSet<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use plural name for collection proteinNames
and components
if(!component.isEmpty()){ | ||
sb.append(":").append(String.join("|", component)); | ||
} | ||
return sb; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can move "|" to a constant.
if(!sourceIds.isEmpty()){ | ||
sb.append(" SS=").append(String.join("|", sourceIds)); | ||
} | ||
sb.append(" PC=").append(proteomeId); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
proteomeId can be null
component.add(ids[2]); | ||
} | ||
} | ||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extract the logic for processing xref.getProperties() into a helper method, simplifying the loop body.
We should keep the lambda small. we can extract the for(source:sources)
logic another method.
@@ -47,6 +49,9 @@ public static void populateUniParcCrossReferenceBuilder(String propertyType, Str | |||
case PROPERTY_UNIPROTKB_ACCESSION: | |||
builder.propertiesAdd(PROPERTY_UNIPROTKB_ACCESSION, propertyValue); | |||
break; | |||
case PROPERTY_SOURCES: | |||
builder.propertiesAdd(PROPERTY_SOURCES, propertyValue); | |||
break; | |||
default: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have not covered this case in test case.
No description provided.